Eigen: a Spectral Approach to the Integration of Functional Genomics Annotations for Both Coding and Noncoding Sequence Variants
Abstract
Over the past few years, substantial effort has been put into the functional annotation of variation in human genome sequence. Indeed, for any genetic variant, whether protein coding or noncoding, a diverse set of functional annotations is available from projects such as Ensembl, ENCODE and Roadmap Epigenomics. Such annotations can play a critical role in identifying putatively causal variants among the abundant natural variation that occurs at a locus of interest. The main challenges in using these annotations are their large number and their diversity. In particular, it is not clear a priori which annotation is better at predicting functionally relevant variants. It is therefore desirable to integrate these different annotations into a single measure of functional importance for a variant. Here we develop an unsupervised approach to derive such a meta-score (Eigen) that, unlike most existing methods, is not based on any labelled training data. Furthermore, the proposed method produces estimates of predictive accuracy for each functional annotation score, and subsequently uses these estimates to derive the aggregate functional score for variants of interest as a weighted linear combination of individual annotations. Using disease-associated and putatively benign variants from published studies (for both Mendelian and complex diseases), we show that the resulting meta-score has better discriminatory ability than the recently proposed CADD score. In particular, we show that the proposed meta-score outperforms the CADD score on noncoding variants from GWAS and eQTL studies, noncoding somatic mutations in the COSMIC database, and on de novo coding mutations in epilepsy and autism studies. Across varied scenarios, the Eigen score generally performs better than any single individual annotation, representing a powerful single functional score that can be incorporated in fine-mapping studies.
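The idea of an unsupervised, weighted linear combination of annotations can be illustrated with a minimal sketch. It is a simplified spectral variant, not the procedure used to compute the published Eigen score: it assumes that the leading eigenvector of the annotation correlation matrix is a reasonable proxy for per-annotation accuracy. The function name `spectral_meta_score`, the simulated data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def spectral_meta_score(annotations):
    """Combine annotation columns into one score without using labels.

    Assumption (not taken from the abstract): weights are read off the
    leading eigenvector of the annotation correlation matrix.
    """
    X = np.asarray(annotations, dtype=float)
    # Standardize each annotation so that scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(Z, rowvar=False)
    # eigh returns eigenvalues in ascending order; the last column of
    # eigvecs is the eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(corr)
    lead = eigvecs[:, -1]
    # Eigenvectors are defined up to sign; orient toward positive weights.
    if lead.sum() < 0:
        lead = -lead
    weights = lead / np.abs(lead).sum()
    # Meta-score: weighted linear combination of standardized annotations.
    return Z @ weights, weights


# Simulated example: four annotations of decreasing quality.
rng = np.random.default_rng(0)
n = 5000
functional = rng.random(n) < 0.2          # hidden labels, never used for fitting
noise_sd = np.array([0.5, 1.0, 2.0, 4.0])  # larger value = noisier annotation
X = 2.0 * functional[:, None] + rng.normal(size=(n, 4)) * noise_sd

scores, weights = spectral_meta_score(X)
print("weights:", np.round(weights, 3))
print("mean score, functional:", scores[functional].mean())
print("mean score, other:", scores[~functional].mean())
```

In this simulation the hidden labels are used only to check the result afterwards; the weights themselves are derived from the correlation structure alone, which is the sense in which the approach is unsupervised.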
Similar resources
Interpreting noncoding genetic variation in complex traits and human disease
Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has focused primarily on protein-coding variants, owing to the difficulty of interpreting noncoding mutations. This picture has changed with advances in the systematic annotation of functional noncoding elements. Evolutionary conservation, functional genomics, chromatin state, se...
Adaptive Spectral Separation Two Layer Coding with Error Concealment for Cell Loss Resilience
This paper addresses the issue of cell loss and its consequent effect on video quality in a packet video system, and examines possible compensative measures. In the system's encoder, adaptive spectral separation is used to develop a two-layer coding scheme comprising a high priority layer to carry essential video data and a low priority layer with data to enhance the video image. A two-step er...
Conserved introns reveal novel transcripts in Drosophila melanogaster.
Noncoding RNAs that are, like mRNAs, spliced, capped, and polyadenylated have important functions in cellular processes. The inventory of these mRNA-like noncoding RNAs (mlncRNAs), however, is incomplete even in well-studied organisms, and so far, no computational methods exist to predict such RNAs from genomic sequences only. The subclass of these transcripts that is evolutionarily conserved usu...
LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations
In cancer research, background models for mutation rates have been extensively calibrated in coding regions, leading to the identification of many driver genes, recurrently mutated more than expected. Noncoding regions are also associated with disease; however, background models for them have not been investigated in as much detail. This is partially due to limited noncoding functional annotati...
O-31: AMH and AMHR2 Genetic Variants in Chinese Women with Primary Ovarian Insufficiency and Normal Age at Natural Menopause
Background: To investigate the role of the anti-Müllerian hormone (AMH) signalling pathway in the pathophysiology of idiopathic primary ovarian insufficiency (POI) and age at natural menopause (ANM) using a genetic approach. Materials and Methods: DNA sequencing was used to detect the genotype distribution and allele frequency of the genes AMH and AMH receptor II (AMHR2) in 120 cases of idiopathic P...